Robust algorithms for speech reconstruction on mobile devices
Author
Abstract
This thesis is concerned with reconstructing an intelligible time-domain speech signal from speech recognition features, such as Mel-frequency cepstral coefficients (MFCCs), in a distributed speech recognition (DSR) environment. The initial reconstruction methods in this thesis require, in addition to MFCC vectors, fundamental frequency and voicing information. In the later parts of the thesis these parameters are predicted from the MFCC vectors. Speech reconstruction is achieved by first estimating a spectral envelope from the MFCC vectors. This is combined with excitation information from the fundamental frequency and voicing, which enables both a source-filter and a sinusoidal model of speech to be investigated and compared. Analysis of the sinusoidal model shows that both clean spectral envelope estimates and robust fundamental frequency estimates are necessary for clean speech reconstruction in noisy environments. Inclusion of spectral subtraction is shown to provide the clean spectral envelope estimates. A comparison of fundamental frequency estimation methods shows that the most robust estimates are obtained from an auditory model. This leads to the proposal of an integrated front-end which replaces the mel-filterbank with the auditory filterbank for MFCC extraction and thereby reduces computation. Speech reconstruction tests reveal that robust fundamental frequency estimation and spectral subtraction lead to intelligible and relatively noise-free speech. Evidence in this work shows that correlation exists between fundamental frequency and the MFCC vectors. This leads to the proposal of predicting the fundamental frequency and voicing of a frame of speech from its MFCC representation. An initial method uses a Gaussian mixture model (GMM) to model the joint density of fundamental frequency and MFCCs. A second method combines the GMMs into a hidden Markov model framework to give a more localized modeling of the joint density.
Experimental results on both speaker-dependent and speaker-independent tasks show accurate prediction of the fundamental frequency and voicing from the MFCC vectors, which leads to intelligible speech reconstruction.
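The GMM-based prediction step can be illustrated with a minimal sketch. The abstract states only that a GMM models the joint density of fundamental frequency and MFCCs; the use of the conditional mean E[f0 | x] as the predictor, the diagonal covariance for the MFCC part, the dictionary keys, and all parameter values below are assumptions made for illustration, not the thesis's implementation. Voicing prediction is not covered.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def predict_f0(x, components):
    """Conditional-mean estimate E[f0 | x] under a joint GMM over (x, f0).

    Each component is a dict with hypothetical keys:
      weight  mixture weight
      mean_x  mean of the MFCC part
      var_x   diagonal variances of the MFCC part (simplifying assumption)
      mean_f  mean of f0 in Hz
      cov_fx  cross-covariance between f0 and each MFCC dimension
    """
    # Unnormalised log posterior of each component given the MFCC vector.
    log_post = [
        math.log(c["weight"]) + diag_gauss_logpdf(x, c["mean_x"], c["var_x"])
        for c in components
    ]
    # Normalise in the log domain for numerical stability.
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]

    estimate = 0.0
    for h, c in zip(post, components):
        # Per-component linear regression of f0 on x.
        cond_mean = c["mean_f"] + sum(
            s / v * (xi - m)
            for s, v, xi, m in zip(c["cov_fx"], c["var_x"], x, c["mean_x"])
        )
        estimate += h * cond_mean
    return estimate


# Toy two-component model with made-up parameters (not trained on speech).
TOY_GMM = [
    {"weight": 0.5, "mean_x": [0.0, 0.0], "var_x": [1.0, 1.0],
     "mean_f": 100.0, "cov_fx": [0.0, 0.0]},
    {"weight": 0.5, "mean_x": [4.0, 4.0], "var_x": [1.0, 1.0],
     "mean_f": 200.0, "cov_fx": [0.0, 0.0]},
]
```

In this toy model, an input halfway between the two component means gives equal posteriors, so `predict_f0([2.0, 2.0], TOY_GMM)` returns the midpoint of the two f0 means. A real system would train the component parameters on paired MFCC and f0 data.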
Similar resources
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal is significantly reduced in the presence of environmental noise, which degrades the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
An efficient framework for robust mobile speech recognition services
A distributed framework for implementing automatic speech recognition (ASR) services on wireless mobile devices is presented. The framework is shown to scale easily to support a large number of mobile users connected over a wireless network and degrade gracefully under peak loads. The importance of using robust acoustic modeling techniques is demonstrated for situations when the use of speciali...
Phase Processing for Single-Channel Speech Enhancement: History and recent advances (Timo Gerkmann, Martin Krawczyk-Becker, and Jonathan Le Roux)
Date of publication: 12 February 2015. With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and also more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for challengi...
Challenges and opportunities for interaction on mobile devices
While there are opportunities in using mobile devices, exploiting mobile applications to the full involves combining several different technologies. Canon Research Centre Europe (CRE) is performing research and development in technologies that address these challenges, most notably: robust embedded speech recognition, multimodal interaction, and mobile information acce...
Enhancement of noisy speech for noise robust front-end and speech reconstruction at back-end of DSR system
This paper presents a speech enhancement method for a noise-robust front-end and speech reconstruction at the back-end of Distributed Speech Recognition (DSR). The speech noise removal algorithm is based on a two-stage noise filtering scheme, LSAHT, which applies a log spectral amplitude (LSA) speech estimator and harmonic tunneling (HT) prior to feature extraction. The noise reduced features are transmitted with some...